
Address ernie-image review findings #13577#13663

Open
akshan-main wants to merge 4 commits into huggingface:main from akshan-main:ernie-image-review-fixes

Conversation

@akshan-main
Contributor

What does this PR do?

Partial fix for #13577. Addresses items 1, 2, and 5 per @yiyixuxu's scoping.

  • (1) Switch ErnieImageAutoPromptEnhancerStep to ConditionalPipelineBlocks so use_pe=False actually skips the prompt enhancer (AutoPipelineBlocks selects on presence, not truthiness).
  • (2) Align the modular VAE BN epsilon to the standard pipeline's hardcoded 1e-5 (matches training; the hub config currently reports 1e-4).
  • (5) Restructure output_type="latent" so it runs maybe_free_model_hooks() and honors return_dict, matching the QwenImage/Flux2 pattern.
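For context on (5), a minimal sketch of the restructure, with the pipeline's VAE decode step, maybe_free_model_hooks(), and ImagePipelineOutput stood in by plain callables and a dict (names here are illustrative, not the actual diffusers implementation):

```python
# Hedged sketch of the output_type="latent" restructure: the latent branch
# no longer returns early, so the hook-offloading call runs on every path
# and return_dict is honored. `decode` stands in for VAE decoding,
# `free_hooks` for maybe_free_model_hooks(), and the dict for
# ImagePipelineOutput.
def finish(latents, output_type, return_dict, decode, free_hooks):
    if output_type == "latent":
        image = latents
    else:
        image = decode(latents)
    free_hooks()  # runs even when output_type="latent"
    if not return_dict:
        return (image,)
    return {"images": image}
```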

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you read our philosophy doc (important for complex PRs)?
  • Was this discussed/approved via a GitHub issue? ernie-image model/pipeline review #13577
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@yiyixuxu


Comment on lines 54 to 58
Member


Can we also take the opportunity to change the auto classes to the real classes? It gets confusing for some users who pass the text encoder directly (e.g. for quantization), and it's also kind of annoying to get the warning all the time.

@github-actions github-actions Bot added size/M PR with diff < 200 LOC and removed size/M PR with diff < 200 LOC labels Apr 30, 2026
@akshan-main
Contributor Author

@asomoza switched text_encoder to Mistral3Model and pe to Ministral3ForCausalLM in both the standard and modular pipelines. Left the tokenizers as AutoTokenizer since Mistral doesn't have a model-specific tokenizer class.

@akshan-main akshan-main requested a review from asomoza April 30, 2026 16:29
Comment on lines +368 to +369
bn_mean = self.vae.bn.running_mean.view(1, -1, 1, 1).to(device)
bn_std = torch.sqrt(self.vae.bn.running_var.view(1, -1, 1, 1) + 1e-5).to(device)
Contributor


dtype casting to be safe and for consistency with the modular pipeline

Suggested change
bn_mean = self.vae.bn.running_mean.view(1, -1, 1, 1).to(device)
bn_std = torch.sqrt(self.vae.bn.running_var.view(1, -1, 1, 1) + 1e-5).to(device)
bn_mean = self.vae.bn.running_mean.view(1, -1, 1, 1).to(device=device, dtype=latents.dtype)
bn_std = torch.sqrt(self.vae.bn.running_var.view(1, -1, 1, 1) + 1e-5).to(device=device, dtype=latents.dtype)

There could be a TODO regarding vae.config.batch_norm_eps, it should be used in the future if the checkpoint config is changed
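As a framework-free sketch of the per-channel math behind the statistics above (the real code operates on tensors and, per the suggestion, should also cast to latents.dtype; the direction of the transform here is my reading of the snippet, labeled as an assumption):

```python
import math

# Illustrative scalar version of the BatchNorm statistics computed above.
# Assumption: the pipeline uses them to de-normalize latents before
# decoding, i.e. x * std + mean with std = sqrt(running_var + eps).
# eps is pinned to 1e-5 (the training value); the TODO is to read
# vae.config.batch_norm_eps once the hub config is corrected.
BN_EPS = 1e-5

def bn_denormalize(x, running_mean, running_var, eps=BN_EPS):
    std = math.sqrt(running_var + eps)
    return x * std + running_mean
```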

Contributor Author


Added

Comment on lines +379 to +383
images = (images.clamp(-1, 1) + 1) / 2
images = images.cpu().permute(0, 2, 3, 1).float().numpy()

if output_type == "pil":
images = [Image.fromarray((img * 255).astype("uint8")) for img in images]
Contributor


Can VaeImageProcessor be used here? cc @yiyixuxu: enforcing VaeImageProcessor could be another agent-review rule?

Contributor Author


Switched both the standard and modular pipelines to VaeImageProcessor.postprocess. This also fixes output_type="pt" in the standard pipeline (it was returning NumPy arrays).
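For reference, a minimal scalar sketch of the pixel math the replaced snippet performed; VaeImageProcessor.postprocess covers the same [-1, 1] denormalization and additionally dispatches on output_type ("pil", "np", "pt"), which is what fixes the "pt" path (this helper is illustrative, not the diffusers code):

```python
def to_uint8(x):
    # What the replaced snippet did per pixel: clamp to [-1, 1], shift to
    # [0, 1], then scale to uint8 the way (img * 255).astype("uint8")
    # does, i.e. by truncation. VaeImageProcessor.postprocess replaces
    # this hand-rolled conversion and handles output_type dispatch.
    x = min(max(x, -1.0), 1.0)
    x = (x + 1.0) / 2.0
    return int(x * 255)
```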

@akshan-main akshan-main requested a review from hlky April 30, 2026 19:37
